SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: IMAI, Kensaku

KITAJIMA, Masato 

(ii) TITLE OF INVENTION: METHOD AND APPARATUS AUTOMATICALLY REMOVING VECTOR UNIT IN DNA BASE SEQUENCE

(iii) NUMBER OF SEQUENCES: 19 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Staas & Halsey 

(B) STREET: 700 Eleventh Street, N.W., Suite 500 

(C) CITY: Washington 

(D) STATE: DC 

(E) COUNTRY: US 

(F) ZIP: 20001 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.3

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/684,674

(B) FILING DATE: 22-JUL-1996 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Herbert, William F. 

(B) REGISTRATION NUMBER: 31,024 

(C) REFERENCE/DOCKET NUMBER: 862.1335/WFH

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 2024341500 

(B) TELEFAX: 2024341501 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AAGCTTGCAT GCCTGCAGGT CGACTCTAGA GGATCCCCGG GTACCGAGCT CGAATTC 57 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
TGCACTTGAA CGCATGCT 18
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TGCACTTGAA CGCTGCT 17 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TGCACTTGAC GCATGCT 17



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGCACTTGAC GCATGCT 17
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TGCCTTGAAC GCATGCT 17

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2686 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA 60
CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG 120
TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA GCAGATTGTA CTGAGAGTGC 180
ACCATATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC 240
ATTCGCCATT CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT 300
TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT 360
TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGCCAA GCTTGCATGC CTGCAGGTCG 420
ACTCTAGAGG ATCCCCGGGT ACCGAGCTCG AATTCGTAAT CATGGTCATA GCTGTTTCCT 480
GTGTGAAATT GTTATCCGCT CACAATTCCA CACAACATAC GAGCCGGAAG CATAAAGTGT 540
AAAGCCTGGG GTGCCTAATG AGTGAGCTAA CTCACATTAA TTGCGTTGCG CTCACTGCCC 600
GCTTTCCAGT CGGGAAACCT GTCGTGCCAG CTGCATTAAT GAATCGGCCA ACGCGCGGGG 660
AGAGGCGGTT TGCGTATTGG GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG 720
GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA 780
GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC 840
CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC 900
AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG 960
TTTCCCCCTG GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC 1020
CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AAAGCTCACG CTGTAGGTAT 1080
CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG 1140
CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC 1200
TTATCGCCAC TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT 1260
GCTACAGAGT TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGAAC AGTATTTGGT 1320
ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC 1380
AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA 1440
AAAAAAGGAT CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC 1500
GAAAACTCAC GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC 1560
CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA AACTTGGTCT 1620
GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA 1680
TCCATAGTTG CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT 1740
GGCCCCAGTG CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA 1800
ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC 1860
ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG 1920
CGCAACGTTG TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT 1980
TCATTCAGCT CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA 2040
AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA 2100
TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC 2160
TTTTCTGTGA CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG 2220
AGTTGCTCTT GCCCGGCGTC AATACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA 2280
GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG 2340
AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC 2400
ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG 2460
GCGACACGGA AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT 2520
CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA 2580
GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC 2640
ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTC 2686



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 66 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GTGCCAAGCT TGCATGCCTG CAGGTCGACT CTAGAGGATC CCCGGTACCG AGCTCGAATT 60 
CGTAAT 66 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AAGCTT 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10
GCATGC

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
CTGCAG 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
GGTACC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13
TCTAGA

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 




(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
GTCGAC 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
GTCGAC 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16
CCCGGG

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
GAATTC 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 



CCCGGG 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19

GAATTC



